Eurostatv2 pr #1094
Thanks for the PR and getting this import done quickly! A few initial comments:
In the future, all scripts should be accompanied by tests. We can skip them this time because of the time crunch, but I would like us to revisit this in the next month. FYI @manishvats2
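A minimal sketch of what such a test could look like, assuming the cleaning steps are factored into small pure helpers. `prefix_geo` and `blank_missing` are hypothetical names, not functions in this PR; they only mirror the two transformations shown in the `clean_data` hunk (the `dcid:nuts/` prefix and the exact-match replacement of `:`).

```python
import unittest


def prefix_geo(code):
    """Hypothetical helper: turn a NUTS code into a dcid reference."""
    return 'dcid:nuts/' + code


def blank_missing(value):
    """Hypothetical helper: blank out a cell that is exactly ':'.

    Mirrors the whole-cell match of clean_df.replace(':', '').
    """
    return '' if value == ':' else value


class CleanHelpersTest(unittest.TestCase):

    def test_prefix_geo(self):
        self.assertEqual(prefix_geo('AT11'), 'dcid:nuts/AT11')

    def test_blank_missing(self):
        self.assertEqual(blank_missing(':'), '')
        # Ordinary values pass through unchanged.
        self.assertEqual(blank_missing('12.5'), '12.5')
```

Saved in a file matching the repo's test naming, this could be run with `python -m unittest`; the sample values (`AT11`, `12.5`) are illustrative only.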
thanks! these changes look good, especially the updates on measurement methods and stat vars. i do want us to think through the update on Count_Person_Employed --> dc/nm9hcklgg5zb3 (/cc @ajaits @hareesh-ms)
please use git lfs for the input and output tsv / csv files. we should have examples of these in the repo
@@ -126,8 +129,15 @@ def clean_data(preprocessed_df, output_path):

# replace colon with NaN.
clean_df = clean_df.replace(':', '')

clean_df['geo'] = 'dcid:nuts/' + clean_df['geo']
# for ind, geo in enumerate(clean_df['geo']):
please remove the commented out code
@@ -37,24 +40,37 @@
'Count_Person_Employed_NACE/O-Q',
'Count_Person_Employed_NACE/O-U',
'Count_Person_Employed_NACE/R-U',
'Count_Person_Employed',
'dc/nm9hcklgg5zb3',
please add a comment that this is "Population: Employed"
@@ -72,7 +82,12 @@ def download_data(self):
"""Downloads raw data from Eurostat website and stores it in instance
data frame.
"""
self.raw_df = pd.read_table(self.DATA_LINK)
# self.raw_df = pd.read_table(self.DATA_LINK)
please remove
self.raw_df = pd.read_table("nama_10r_3gdp.tsv.gz")
self.raw_df = self.raw_df.rename(columns=({'freq,unit,geo\TIME_PERIOD': 'unit,geo\\time'}))
self.raw_df['unit,geo\\time'] = self.raw_df['unit,geo\\time'].str.slice(2)
# return raw_df
please remove
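For reference, a pandas-free sketch of the header handling in this hunk, isolating just the string logic. Note that `'freq,unit,geo\TIME_PERIOD'` only works because `\T` is not a recognized escape sequence, and newer Python versions warn about it; escaping the backslash avoids that. `normalize_header` and `drop_freq` are hypothetical helper names, and `drop_freq` splits on the first comma instead of using `.str.slice(2)`, which is a swapped-in approach that does not assume a one-character frequency code.

```python
# Column header as it appears in the hunk, with the backslash escaped explicitly.
NEW_HEADER = 'freq,unit,geo\\TIME_PERIOD'
# Header the rest of the script expects.
OLD_HEADER = 'unit,geo\\time'


def normalize_header(name):
    """Hypothetical helper: map the new header to the old one."""
    return OLD_HEADER if name == NEW_HEADER else name


def drop_freq(key):
    """Hypothetical helper: drop the leading 'freq' field from a row key.

    Keys without a comma are returned unchanged.
    """
    _freq, sep, rest = key.partition(',')
    return rest if sep else key


# Illustrative row key, not taken from the dataset.
print(drop_freq('A,MIO_EUR,AT'))
```

With pandas, the same helpers could be applied via `df.rename(columns=normalize_header)` and `Series.map(drop_freq)`.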
@@ -174,13 +191,16 @@ def generate_tmcf(self):
assert col in ['geo', 'time']
continue
col_num += 1
# Amount_EconomicActivity_GrossDomesticProduction_Nominal_AsAFractionOf_Count_Person
please remove
@@ -95,5 +110,9 @@ def get_template_mcf():

if __name__ == "__main__":
# _DATA_URL = "https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file=data/demo_r_d3dens.tsv.gz"
please remove